Identification of a novel cuproptosis‐related gene signature for multiple myeloma diagnosis

Abstract

Background: Multiple myeloma (MM) ranks second among the most prevalent hematological malignancies. Recent studies have unearthed the promise of cuproptosis as a novel therapeutic intervention for cancer. However, no research has unveiled the particular roles of cuproptosis‐related genes (CRGs) in the prediction of MM diagnosis.

Methods: Microarray data and clinical characteristics of MM patients were obtained from the Gene Expression Omnibus (GEO) database. Differentially expressed gene analysis, least absolute shrinkage and selection operator (LASSO) and support vector machine‐recursive feature elimination (SVM‐RFE) algorithms were applied to identify potential signature genes for MM diagnosis. Predictive performance was further assessed by receiver operating characteristic (ROC) curves, nomogram analysis, and external data sets. Functional enrichment analysis was performed to elucidate the involved mechanisms. Finally, the expression of the identified genes was validated by quantitative real‐time polymerase chain reaction (qRT‐PCR) in MM cell samples.

Results: The optimal gene signature was identified using LASSO and SVM‐RFE algorithms based on the differentially expressed CRGs: ATP7A, FDX1, PDHA1, PDHB, MTF1, CDKN2A, and DLST. Our gene signature‐based nomogram revealed a high degree of accuracy in predicting MM diagnosis. ROC curves showed the signature had dependable predictive ability across all data sets, with area under the curve values exceeding 0.80. Additionally, functional enrichment analysis suggested significant associations between the signature genes and immune‐related pathways. The expression of the genes was validated in MM cells, indicating the robustness of these findings.

Conclusion: We discovered and validated a novel CRG signature with strong predictive capability for diagnosing MM, potentially implicated in MM pathogenesis and progression through immune‐related pathways.

Multiple myeloma (MM), the second most common hematological malignancy in high-income countries, is characterized by the uncontrolled accumulation of monoclonal plasma cells in the bone marrow. 1,2,[4][5][6] The symptoms reported by MM patients on presentation are usually nonspecific and may have already been present for an extended period. 3,7 Asymptomatic patients present with only laboratory abnormalities, such as anemia, renal disease, and elevated protein levels. 8,9 Currently, no single test can reliably diagnose or monitor all MM patients. [11][12] Early diagnosis and subsequent management of MM could improve patients' quality of life and reduce the symptom burden and serious complications of this severe disease. 1,3,13 Therefore, identifying and establishing new, reliable, and sensitive diagnostic biomarkers that detect the early stages of the disease is crucial for effective treatment.
Copper is an essential trace element in the human body, but it becomes toxic when its concentration exceeds a certain threshold. 14 "Cuproptosis" is a term coined to describe a recently discovered mode of mitochondrial cell death triggered by copper. 15 Intracellular copper directly binds to lipoylated components of the tricarboxylic acid (TCA) cycle, which induces the aggregation of mitochondrial lipoylated proteins and the destabilization of iron-sulfur cluster proteins, resulting in proteotoxic stress and ultimately cell death. 15 Copper dyshomeostasis has been associated with MM. [17][18] In addition, a complex of copper and disulfiram has been shown to significantly inhibit the growth of myeloma cell lines. 19,20 Previous reports have described the functional roles of cuproptosis-related genes (CRGs) in cancer development. For instance, CRGs have been used to predict clinical outcome and immune response in hematological malignancies, including acute myeloid leukemia 21,22 and diffuse large B-cell lymphoma. 23 Recent studies have emphasized the potential utility of CRGs in the development of prognostic models for MM. 24,25 These findings indicate an association between CRGs and MM. However, these studies focused primarily on using CRGs to predict MM prognosis rather than MM diagnosis.
Considering the significance of early detection in identifying asymptomatic individuals who would benefit from timely intervention, we aimed in this study to use CRGs for MM diagnostic prediction. We identified a CRG signature for MM diagnosis through differentially expressed gene analysis and the least absolute shrinkage and selection operator (LASSO) and support vector machine-recursive feature elimination (SVM-RFE) algorithms. We then validated our findings using nine external data sets. Furthermore, we conducted functional enrichment analysis to investigate the potential mechanisms. Additionally, the expression of the identified genes in MM cell samples was validated using quantitative real-time polymerase chain reaction (qRT-PCR). Our study identified a novel role of CRGs in MM diagnosis and explored the underlying molecular mechanisms. Our findings are expected to contribute to more effective decision-making in establishing and implementing strategies for MM diagnosis and treatment.

| Data acquisition
Ten data sets (Supporting Information: Table S1) comprising MM and control samples, namely GSE5900, GSE6477, GSE24870, GSE27838, GSE46053, GSE113295, GSE113736, GSE118985, GSE133346, and GSE146649, were obtained from the Gene Expression Omnibus database (GEO, https://www.ncbi.nlm.nih.gov/geo/). The raw data of all analyzed data sets were normalized to remove batch effects. The GSE118985 data set served as the training cohort for subsequent analyses, whereas the GSE5900, GSE6477, GSE24870, GSE27838, GSE46053, GSE113295, GSE113736, GSE133346, and GSE146649 data sets served as external validation cohorts for result verification. Based on our literature review, 15,[26][27][28][29] a total of 19 CRGs (Supporting Information: Table S2) were subjected to analysis.

| Identification of differentially expressed CRGs
The expression data of CRGs in MM and control samples from the training cohort were extracted and subjected to differential expression analysis with the "limma" package. Differentially expressed CRGs were required to meet the criterion of adjusted p < .05. The heatmap of the differentially expressed CRGs was drawn with the "pheatmap" package in R software.
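The adjusted p < .05 criterion can be illustrated with a minimal sketch. The text does not state which multiple-testing correction was applied; the example below assumes the Benjamini-Hochberg procedure, and the gene labels and raw p values are hypothetical, chosen only to show that a raw p value below .05 may no longer pass after adjustment.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values (assumption: not taken from the study).
raw_p = {"GENE_A": 0.001, "GENE_B": 0.008, "GENE_C": 0.039,
         "GENE_D": 0.041, "GENE_E": 0.60}
adj = dict(zip(raw_p, bh_adjust(list(raw_p.values()))))
significant = [gene for gene, p in adj.items() if p < 0.05]
print(adj)
print(significant)
```

In this toy input, GENE_C and GENE_D have raw p values below .05 but adjusted values slightly above it, so only GENE_A and GENE_B are retained.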

| Identification of the optimal gene signature
LASSO is a regression method that improves prediction accuracy for high-dimensional data and reduces overfitting during modeling. 30,31 Moreover, the SVM-RFE algorithm can extract the most relevant features through nonlinear kernels to screen out the best combination of variables. 32 The optimal panel of predictive signature genes was identified by selecting the overlapping genes obtained from the LASSO and SVM-RFE methods based on the differentially expressed CRGs. Additionally, receiver operating characteristic (ROC) curves were constructed to assess the accuracy and specificity of these genes in diagnosing MM. These analyses were performed and visualized with the "glmnet," "e1071," "VennDiagram," and "pROC" packages.
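To illustrate why LASSO acts as a variable selector, the following minimal sketch applies the soft-thresholding rule, which gives the LASSO solution in closed form under the simplifying assumptions of squared-error loss and an orthonormal design. It is not the model fitted in the study, whose "glmnet" settings are not described in the text; the gene labels, unpenalized coefficients, and penalty value are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized coefficients (assumption: illustrative values only).
# For the objective 0.5*||y - Xb||^2 + lam*||b||_1 with X'X = I, the LASSO
# estimate of each coefficient is the soft-thresholded unpenalized estimate.
unpenalized = {"g1": 1.8, "g2": -0.9, "g3": 0.3, "g4": -0.1}
lam = 0.5  # hypothetical penalty strength

lasso_coef = {gene: soft_threshold(b, lam) for gene, b in unpenalized.items()}
selected = [gene for gene, b in lasso_coef.items() if b != 0.0]
print(lasso_coef)
print(selected)
```

Coefficients whose magnitude is below the penalty are set exactly to zero, which is the sense in which LASSO "extracts" a subset of genes; in practice the penalty is usually chosen by cross-validation.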

| Nomogram construction
A nomogram was constructed based on the obtained gene signature to forecast the risk of disease. The predictive ability of the established model was evaluated using the calibration curve and decision curve analysis (DCA). The "rms" and "rmda" packages were employed to perform these analyses and visualize their results.
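The quantity plotted in a decision curve can be sketched as follows. This is a minimal illustration of the standard net-benefit formula, not a reproduction of the "rmda" workflow; the labels, predicted risks, and the threshold are hypothetical.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: classify every individual as a case."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical data (assumption): 1 = MM, 0 = control, with model-predicted risks.
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_risk = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.2, 0.1, 0.1, 0.05]

nb_model = net_benefit(y, predicted_risk, threshold=0.5)
nb_all = net_benefit_treat_all(y, threshold=0.5)
print(nb_model, nb_all)
```

A decision curve repeats this calculation over a range of thresholds; a model is considered useful where its net benefit exceeds both the treat-all reference and the treat-none reference (net benefit of zero).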

| Validation of the gene signature
The ability of the gene signature to differentiate between MM and controls was assessed by ROC curve analysis in nine validation cohorts: GSE5900, GSE6477, GSE24870, GSE27838, GSE46053, GSE113295, GSE113736, GSE133346, and GSE146649. The results were visualized using the "glmnet" and "pROC" packages.
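For readers unfamiliar with the AUC, the following minimal sketch computes it directly from its rank-based definition: the probability that a randomly chosen MM sample receives a higher model score than a randomly chosen control, with ties counted as one half. The labels and scores are hypothetical, and this is an illustration rather than the "pROC" implementation.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes must be present")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties contribute half a win
    return wins / (len(cases) * len(controls))


# Hypothetical scores (assumption): 1 = MM, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Here eight of the nine case-control pairs are ordered correctly, so the AUC is 8/9.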

| Functional enrichment analysis
Gene set enrichment analysis (GSEA) software was employed to explore the potential mechanisms by identifying significantly enriched pathways for each signature gene using the gene set (Kegg.v7.4.symbols.gmt). Moreover, the CIBERSORT algorithm was applied to compare the levels of 22 distinct infiltrating immune cell types between the MM and control individuals. We performed Spearman correlation analysis to assess the correlation between the signature genes and the infiltrating immune cells. The results of these analyses were visualized with the "clusterProfiler," "enrichplot," "DOSE," "org.Hs.eg.db," "preprocessCore," "ggpubr," "tidyverse," "ggplot2," and "reshape2" packages.
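The Spearman correlation used to relate gene expression to immune cell fractions can be sketched as the Pearson correlation of ranks. The example below is a minimal illustration with hypothetical values for one gene and one cell type across five samples; it does not compute a p value.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values (assumption): expression of one gene and the
# estimated fraction of one immune cell type in five samples.
gene_expression = [2.1, 3.4, 1.8, 4.0, 2.9]
cell_fraction = [0.30, 0.12, 0.35, 0.20, 0.05]
rho = spearman(gene_expression, cell_fraction)
print(round(rho, 2))
```

A negative coefficient, as in this toy input, would indicate that samples with higher expression of the gene tend to have a lower estimated fraction of that cell type.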

| qRT-PCR
MM cell line H929 cells were commercially obtained from the American Type Culture Collection (ATCC). These cells were cultured in RPMI-1640 medium (Gibco) supplemented with 10% fetal bovine serum (Gibco) and 1% penicillin-streptomycin (Gibco). Peripheral blood mononuclear cells (PBMCs) were isolated from healthy donors using Ficoll-Hypaque density gradient centrifugation after obtaining informed consent. This study was approved by the Ethics Committee of Shanghai Yangpu Hospital (LL-009). All cells were incubated at 37°C in a humidified atmosphere with 5% carbon dioxide. Total RNA was extracted using TRIzol reagent (Invitrogen), following the manufacturer's protocol. Subsequently, complementary DNA was reverse-transcribed from 1 μg of total RNA using the PrimeScript™ RT Reagent Kit (Takara). qRT-PCR was performed in 96-well plates using the SYBR Green PCR Master Mix (KAPA) and the Applied Biosystems 7900HT Fast Real-Time PCR System (Thermo Fisher Scientific). Gene expression levels were determined using the $2^{-\Delta\Delta C_t}$ method, with ACTIN serving as the internal control. The primers used for qRT-PCR are presented in Table 1.
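The relative quantification step can be sketched as follows. This is a minimal illustration of the comparative Ct calculation, assuming roughly 100% amplification efficiency for both the target and the reference gene; the function name and the Ct values are hypothetical and are not measurements from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in a sample versus a control (2^-ddCt)."""
    # Normalize the target gene to the reference gene within each group.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized values between the two groups.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** -delta_delta_ct


# Hypothetical Ct values (assumption): a target gene and the reference
# gene measured in MM cells and in control cells.
fold_change = relative_expression(
    ct_target_sample=24.0, ct_ref_sample=18.0,
    ct_target_control=26.0, ct_ref_control=18.0,
)
print(fold_change)
```

With these illustrative inputs the target reaches threshold two cycles earlier in the MM cells after normalization, which corresponds to a fourfold higher relative expression.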

| Statistical analysis
Data analysis was conducted using R software (version 4.3.0) and GraphPad Prism software (version 8.0.1). For quantitative variables, differences between groups were analyzed using Student's t test for normally distributed data and the Wilcoxon test for skewed data. A two-sided p < .05 was considered to indicate a statistically significant difference. The significance levels are denoted as follows: *p < .05, **p < .01, and ***p < .001.

| Identification of differentially expressed CRGs
Based on the criteria applied to the training cohort, 12 CRGs were identified as differentially expressed genes between the MM and control samples (Figure 2A, Supporting Information: Table S3). Compared with the normal controls, the expression levels of FDX1, LIPT1, PDHB, GLS, and CDKN2A were upregulated, whereas the expression levels of NLRP3, ATP7B, ATP7A, DLD, PDHA1, MTF1, and DLST were downregulated in the MM samples. In addition, the identified genes showed significant synergistic relationships with one another; their interactions and interrelationships are visualized in the correlation plot in Figure 2B.

| Identification of the optimal gene signature
Different machine learning algorithms were employed to identify the optimal gene signature for MM diagnosis based on the differentially expressed CRGs. The LASSO algorithm extracted a total of 11 genes (Figure 3A,B). The SVM-RFE algorithm retained 7 genes as the optimal feature gene combination (Figure 3C,D). As a result, seven overlapping genes (ATP7A, FDX1, PDHA1, PDHB, MTF1, CDKN2A, and DLST) were identified as the optimal CRG signature for diagnostic prediction in MM patients (Figure 3E). Moreover, ROC curves were generated to establish the diagnostic values of the signature genes for MM. As shown in Figure 3F, the area under the curve (AUC) values for all genes exceeded 0.6. We also constructed a logistic regression model using the obtained gene signature for diagnostic prediction, which achieved an AUC of 0.887 (Figure 3G), indicating a high predictive value for MM. These results show that the gene signature-based model outperformed the individual genes in terms of diagnostic value for MM.
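The overlap step that yields the seven-gene signature amounts to a set intersection, sketched below. The 12 differentially expressed CRGs and the seven signature genes are taken from the text. Which single gene LASSO excluded is not stated, so the choice of DLD here is an arbitrary placeholder for illustration only; the SVM-RFE set is inferred from the fact that it contains seven genes and the overlap is also seven.

```python
# The 12 differentially expressed CRGs reported for the training cohort.
de_crgs = {"FDX1", "LIPT1", "PDHB", "GLS", "CDKN2A", "NLRP3",
           "ATP7B", "ATP7A", "DLD", "PDHA1", "MTF1", "DLST"}

# Assumption: LASSO kept 11 of the 12 genes, but the excluded gene is not
# named in the text; DLD is dropped here purely as a placeholder.
lasso_selected = de_crgs - {"DLD"}

# SVM-RFE retained 7 genes and the overlap has 7 genes, so the SVM-RFE
# selection must coincide with the reported signature.
svm_rfe_selected = {"ATP7A", "FDX1", "PDHA1", "PDHB", "MTF1", "CDKN2A", "DLST"}

signature = sorted(lasso_selected & svm_rfe_selected)
print(signature)
```

Taking the intersection keeps only genes that both selection procedures agree on, which is a conservative way to combine two feature selectors with different inductive biases.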

| Nomogram construction
We developed a nomogram using the identified gene signature to predict the risk of MM (Figure 4A). The calibration curve showed close agreement between the observed outcomes and the predicted risk probabilities (Figure 4B). DCA revealed that the constructed nomogram achieved a satisfactory benefit for clinical decision-making (Figure 4C).

| Validation of the gene signature
We validated our gene signature-based model using nine external data sets containing MM and control samples. Satisfactory model performance was achieved as determined by the ROC curves, with AUC values of 0.905, 0.905, 0.959, 0.809, 0.848, 0.861, 0.924, 0.840, and 0.971 for the GSE5900, GSE6477, GSE24870, GSE27838, GSE46053, GSE113295, GSE113736, GSE133346, and GSE146649 data sets, respectively (Figure 5A-I). Therefore, the gene signature-based model was confirmed to have a strong MM diagnostic value in all the training and validation cohorts, with AUC values consistently exceeding 0.80.

| Functional enrichment analysis
The top six pathways enriched for each signature gene are shown in Figure 6A-G. Our findings revealed the enrichment of these genes in immune-related pathways (e.g., B-cell receptor signaling pathway, cytokine-cytokine receptor interaction, and intestinal immune network for immunoglobulin A production) and immune-related diseases (e.g., primary immunodeficiency, autoimmune thyroid disease, and graft-versus-host disease). The results of the CIBERSORT algorithm showed significant differences in 17 of 22 infiltrating immune cell types between the MM and control samples (Figure 6H). Additionally, our correlation analysis revealed significant associations between the signature genes and the infiltrating immune cells (Figure 6I).

| Construction of a ceRNA network
A ceRNA network was constructed using the miRDB, TargetScan, miRanda, and spongeScan databases to investigate the regulation of the signature genes (Figure 7). The network consisted of 524 nodes (including 7 mRNAs, 257 miRNAs, and 260 lncRNAs) and 674 edges (Supporting Information: Table S4).

Subsequently, qRT-PCR was used to validate the expression of the identified genes in the H929 and control cells. As expected, the expression levels of ATP7A, PDHA1, MTF1, and DLST in the MM samples were significantly lower than those in the control group, whereas the levels of FDX1, PDHB, and CDKN2A were significantly higher in the MM samples (Figure 8).

It is worth noting that previous studies have reported the use of CRGs for prognostic prediction in MM. 24,25 However, given the significance of early diagnosis and effective management for improving patients' quality of life and reducing the burden of symptoms and complications, an exploration of the potential impact of CRGs on predicting the diagnosis of MM is imperative. The main finding of this study is the successful identification and validation of a CRG-based gene signature for MM diagnosis, which was achieved by differentially expressed gene analysis and the LASSO and SVM-RFE algorithms. This signature can facilitate decision-making on the establishment and implementation of diagnostic strategies for MM in clinical practice.

In this study, 12 of the 19 CRGs were identified as differentially expressed genes between the MM and control samples, underscoring the critical role of CRGs in the development and progression of MM. Leveraging these differentially expressed CRGs, we employed the LASSO and SVM-RFE algorithms to screen the feature variables and construct an optimal diagnostic model. A total of 11 and 7 genes were extracted using the LASSO and SVM-RFE algorithms, respectively. Ultimately, a gene signature consisting of seven overlapping genes (ATP7A, FDX1, PDHA1, PDHB, MTF1, CDKN2A, and DLST) was identified as the optimal cuproptosis-related signature for predicting MM diagnosis, with an AUC value of 0.887 in the training cohort. Additionally, the nomogram based on this gene model had high accuracy in predicting MM diagnosis, suggesting that the developed model may serve as a powerful diagnostic tool for MM. The gene signature was validated on nine external data sets, which confirmed its robustness, with AUC values consistently exceeding 0.80.

F I G U R E 5
Validation of the gene signature. Receiver operating characteristic curves of the gene signature in the GSE5900 (A), GSE6477 (B), GSE24870 (C), GSE27838 (D), GSE46053 (E), GSE113295 (F), GSE113736 (G), GSE133346 (H), and GSE146649 (I) data sets, respectively.

The identified signature genes have various roles. [34][35] FDX1 is essential for the synthesis of various steroid hormones. 36,37 [39][40] MTF1 acts as a transcription factor that induces the expression of metallothioneins and other genes involved in metal homeostasis in response to heavy metals. 41 [44] DLST, a mitochondrial protein belonging to the 2-oxoacid dehydrogenase family, catalyzes the conversion of 2-oxoglutarate to succinyl-CoA and carbon dioxide. 45,46 It is noteworthy that allelic loss of CDKN2A has previously been reported in MM patients. 47 In addition, the differential expression of the identified genes was validated in MM cell samples, increasing the robustness of our findings. However, the specific roles of the signature genes in the diagnosis of MM are not yet fully understood.
Functional enrichment analysis was used to investigate the potential mechanisms by which the signature genes contribute to the pathogenesis and progression of MM. GSEA revealed significant enrichment of immune-related pathways and diseases among the signature genes. Therefore, the CIBERSORT algorithm was employed to assess the differences in the infiltration of 22 immune cell types between the MM and control samples. The results showed significant differences in 17 of the 22 infiltrating immune cell types. Among these, the MM group showed significantly higher levels of regulatory T cells (Tregs) and mast cells. Tregs are a diverse population of CD4+ T cells of different origins, phenotypes, and subtypes whose suppressive functions are crucial for self-tolerance. 48 Earlier studies showed that Tregs may be recruited and exploited by various tumor cells to evade immunosurveillance and eliminate protective antitumor immunity. 49,50 [52][53][54][55] In addition, cytokines such as tumor necrosis factor-alpha and interleukin-10 secreted by mast cells are essential in promoting Treg-mediated immune tolerance, which contributes to immune suppression and tumor promotion. 56,57 Additionally, the vascular endothelial growth factor (VEGF) released by mast cells causes an angiogenic response and promotes tumor angiogenesis, which plays vital biological roles in the development of neoplastic disorders. 58,59 Several studies have shown that the degree of mast cell infiltration parallels the severity of MM. [60][61][62] Additionally, the correlations between the identified genes and the infiltrating immune cells were calculated, revealing notable interactions between the signature genes and tumor immunity. [64][65] These previous results are in agreement with the findings of the present investigation. Thus, it is reasonable to infer that the identified genes may support the pathogenesis and progression of MM by suppressing the immune response through immune-related pathways. Finally, gene regulation by the mRNA-miRNA-lncRNA ceRNA network could facilitate further exploration of the molecular mechanisms of MM.
In this study, we successfully identified and validated a novel signature consisting of CRGs, showcasing its potential as a promising biomarker for MM diagnosis. Importantly, the gene signature achieved AUC values exceeding 0.80 in all the data sets, supporting the robust predictive capacity of our model. Moreover, the expression patterns of the identified genes remained consistent in the MM cell samples, further supporting the potential for clinical application. Of note, the results of functional enrichment analysis indicated notable associations between the signature genes and pathways related to the immune system. However, some limitations of our study need to be acknowledged. Although the differential expression of the identified genes was validated by qRT-PCR analysis in our study, more prospective investigations are needed to validate the predictive power. In addition, the underlying molecular mechanisms of the genes are not yet fully understood and will be the focus of our future research efforts.

We identified a novel signature associated with CRGs, which showed a strong diagnostic value for MM with AUC values exceeding 0.80 across 10 data sets. Functional enrichment analysis indicated that the signature genes might enhance the survival and progression of myeloma cells by influencing immune-related signaling pathways. Our study highlights the involvement of CRGs in the diagnosis of MM and investigates the underlying

Figure 1 depicts the flowchart of gene signature identification and validation, nomogram construction, and the subsequent analyses conducted in this study.

T A B L E 1
Abbreviation: qRT-PCR, quantitative real-time polymerase chain reaction.

F I G U R E 1
Flowchart of this study. GEO, Gene Expression Omnibus; LASSO, least absolute shrinkage and selection operator; qRT-PCR, quantitative real-time polymerase chain reaction; SVM-RFE, support vector machine-recursive feature elimination.

F I G U R E 2
Identification of differentially expressed cuproptosis-related genes (CRGs). (A) Heatmap of differentially expressed CRGs; (B) correlation plot of differentially expressed CRGs.

F I G U R E 3
Identification of the optimal gene signature. (A, B) Least absolute shrinkage and selection operator (LASSO) algorithm to identify the optimal genes; (C, D) support vector machine-recursive feature elimination (SVM-RFE) algorithm for the identification of the optimal combination of feature genes; (E) overlapped genes based on the intersections of the LASSO and SVM-RFE algorithms; (F) receiver operating characteristic (ROC) curves of the signature genes in the training cohort; (G) ROC curve of the gene signature-based model in the training cohort. AUC, area under the curve; CI, confidence interval.

F I G U R E 4
Nomogram construction. (A) Multiple myeloma risk prediction nomogram; (B) calibration curve; (C) decision curve analysis.

F I G U R E 7
A competitive endogenous RNA network based on the identified biomarkers. mRNA, messenger RNA; miRNA, microRNA; lncRNA, long noncoding RNA.